Optimizing Selections over Datacubes
Abstract
Datacube queries compute aggregates over database relations at a variety of granularities. Often one wants only datacube output tuples whose aggregate value satisfies a certain condition, such as exceeding a given threshold. We develop algorithms for processing a datacube query using the selection condition internally during the computation. Thus, we can safely prune parts of the computation and end up with a more efficient computation of the answer. Our first technique, called “specialization”, uses the fact that a tuple in the datacube does not meet the given threshold to infer that no finer-level aggregate can meet the threshold. Our second technique, called “generalization”, applies in the case where the actual value of the aggregate is not needed in the output, but is used only for comparison with the threshold. We demonstrate the efficiency of these techniques by implementing them within the sparse datacube algorithm of Ross and Srivastava. We present a performance study using synthetic and real-world data sets. Our results indicate substantial performance improvements for queries with selective conditions.
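The pruning idea behind specialization can be illustrated with a small sketch. It is not the sparse datacube algorithm of Ross and Srivastava, nor the authors' implementation; it is a simple recursive partitioning scheme that assumes the aggregate is COUNT and the condition is "count >= min_count", so that a group failing the threshold cannot have any finer-level subgroup that passes. The function name `iceberg_cube`, its parameters, and the sample data are hypothetical.

```python
def iceberg_cube(rows, dims, min_count):
    """Return {group_key: count} for every group with count >= min_count.

    A group_key is a tuple of (dimension, value) pairs; the empty tuple
    is the grand total. Assumes COUNT with a lower-bound threshold, which
    is what makes the pruning below safe.
    """
    result = {}

    def expand(partition, key, start):
        # Refine the current group by each remaining dimension in turn.
        for i in range(start, len(dims)):
            groups = {}
            for row in partition:
                groups.setdefault(row[dims[i]], []).append(row)
            for value, sub in groups.items():
                if len(sub) < min_count:
                    # Specialization-style pruning: every finer group is a
                    # subset of `sub`, so its count cannot reach min_count.
                    continue
                finer_key = key + ((dims[i], value),)
                result[finer_key] = len(sub)
                expand(sub, finer_key, i + 1)

    if len(rows) >= min_count:
        result[()] = len(rows)
        expand(rows, (), 0)
    return result


if __name__ == "__main__":
    sales = [
        {"city": "NY", "item": "pen"},
        {"city": "NY", "item": "pen"},
        {"city": "NY", "item": "ink"},
        {"city": "LA", "item": "pen"},
    ]
    for key, count in iceberg_cube(sales, ["city", "item"], 2).items():
        print(key, count)
```

In this sketch the group city = LA has a single row, so it is discarded and none of its refinements by item are ever partitioned or counted. The saving grows with the number of dimensions and with the selectivity of the threshold.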
Similar resources
Fast Computation of Sparse Datacubes
Datacube queries compute aggregates over database relations at a variety of granularities, and they constitute an important class of decision support queries. Real-world data is frequently sparse, and hence efficiently computing datacubes over large sparse relations is important. We show that current techniques for computing datacubes over sparse relations do not scale well with the number of C...
A Systematic Approach for Managing the Risk Related to Semantic Interoperability between Geospatial Datacubes
Geospatial datacubes are the database backend of novel types of spatiotemporal decision-support systems employed in large organizations. These datacubes extend the datacube concept underlying the field of Business Intelligence (BI) into the realm of geospatial decision-support and geographic knowledge discovery. The interoperability between geospatial datacubes facilitates the reuse of their co...
An UML Profile and SOLAP Datacubes Multidimensional Schemas Transformation Process for Datacubes Risk-Aware Design
Spatial Data Warehouses (SDWs) and Spatial On-Line Analytical Processing (SOLAP) systems are new technologies for the integration and the analysis of huge volume of data with spatial reference. Spatial vagueness is often neglected in these types of systems and the data and analysis results are considered reliable. In a previous work, the authors provided a new design method for SOLAP datacubes ...
From Transactional Spatial Databases Integrity Constraints to Spatial Datacubes Integrity Constraints
Spatial multidimensional databases (also called "spatial datacubes") are the cornerstone of the emerging Spatial On-Line Analytical Processing technology (SOLAP). They are aimed at supporting Geographic Knowledge Discovery (GKD) as well as certain types of spatial decision-making. Although these technologies seem promising at first glance, they may provide unreliable results if one does not con...
A Conceptual Framework to Support Semantic Interoperability of Geospatial Datacubes
Today, we observe a wide use of geospatial databases that are implemented in many forms (e.g. transactional centralized systems, distributed databases, multidimensional datacubes). Among those possibilities, the multidimensional datacube is more appropriate to support interactive analysis and to guide the organization’s strategic decisions, especially when different epochs and levels of informa...